feat: Add phase_status tracking and PARTIAL apply status #1690
Conversation
Add per-phase execution status tracking to manifest versions for partial failure recovery. When a manifest apply fails mid-execution, the system now records which phases completed, which failed, and which were skipped.

Changes:
- Migration: Add `phase_status` JSONB column to `manifest_versions`
- Migration: Add `PARTIAL` to `apply_status` CHECK constraint (separate file for CockroachDB compatibility)
- Domain: `PhaseStatusMap`, `PhaseStatusEntry` types with JSON serialization
- Repository: `UpdatePhaseStatus` method, `GetPhaseStatus`/`SetPhaseStatus` on `VersionEntity`
- Proto: `PhaseStatusDetail` message, `phase_status` map on `ManifestVersion` and `ApplyManifestResponse`, `APPLY_STATUS_PARTIAL` and `APPLY_MANIFEST_STATUS_PARTIAL` enum values
- gRPC handler: Derives phase status from execution plan phases and saga step results, classifies partial vs complete failure
- Tests: Unit tests for classification, phase status building, and round-trip serialization; integration tests for DB persistence
📝 Walkthrough

This change introduces phase-level status tracking for manifest application: proto additions for phase details and partial status, runtime phase tracking and classification in the applier, persistence of phase status in history/repository, DB migrations for a JSONB `phase_status` column and a `PARTIAL` apply state, and tests covering the new behavior.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant Handler as ApplyManifestHandler
    participant Planner as ExecutionPlanner
    participant Executor as Executor
    participant StatusMgr as PhaseStatusManager
    participant History as HistoryService
    participant DB as Database
    Client->>Handler: ApplyManifest(request)
    Handler->>Planner: build execution plan
    Handler->>Handler: buildInitialPhaseStatus(plan)
    Handler->>Executor: Execute plan
    Executor-->>Handler: ApplyManifestResult (steps success/fail)
    Handler->>StatusMgr: updatePhaseStatus(plan, result, execErr)
    alt Partial failure detected
        StatusMgr->>StatusMgr: findFailedPhase(...)
        StatusMgr->>StatusMgr: mark phases COMPLETED/FAILED/SKIPPED, set timestamps/errors
    else All success
        StatusMgr->>StatusMgr: mark all phases COMPLETED
    end
    StatusMgr-->>Handler: phaseStatus map
    Handler->>Handler: classifyFailure(phaseStatus) -> (ApplyStatus, ManifestStatus)
    Handler->>History: recordHistoryWithPhaseStatus(manifest, phaseStatus, ...)
    History->>DB: Store manifest version with phase_status JSONB
    DB-->>History: persist result
    History-->>Handler: stored ManifestVersion proto
    Handler->>Handler: phaseStatusMapToResponseProto(phaseStatus)
    Handler-->>Client: ApplyManifestResponse (status + phase_status)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 5
🧹 Nitpick comments (1)
services/control-plane/internal/manifest/repository.go (1)
79-105: Keep `UpdatePhaseStatus` on the typed `PhaseStatusMap` API.

`UpdatePhaseStatus` re-exposes raw JSON even though `SetPhaseStatus` already owns serialization in this package. Keeping the repository API typed avoids duplicated marshaling and malformed writes leaking into callers.

Suggested shape

```diff
-func (r *Repository) UpdatePhaseStatus(ctx context.Context, id uuid.UUID, phaseStatusJSON *string) error {
+func (r *Repository) UpdatePhaseStatus(ctx context.Context, id uuid.UUID, phaseStatus PhaseStatusMap) error {
+	var entity VersionEntity
+	if err := entity.SetPhaseStatus(phaseStatus); err != nil {
+		return err
+	}
 	return db.WithGormTenantTransaction(ctx, r.db, func(tx *gorm.DB) error {
-		result := tx.Model(&VersionEntity{}).Where("id = ?", id).Update("phase_status", phaseStatusJSON)
+		result := tx.Model(&VersionEntity{}).Where("id = ?", id).Update("phase_status", entity.PhaseStatus)
 		if result.Error != nil {
 			return fmt.Errorf("update phase_status: %w", result.Error)
 		}
```

Also applies to: 264-276
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/control-plane/internal/manifest/repository.go` around lines 79 - 105, The UpdatePhaseStatus method should operate on the typed PhaseStatusMap rather than raw JSON: change UpdatePhaseStatus to accept/return PhaseStatusMap (not []byte/string), call VersionEntity.GetPhaseStatus to read current map, mutate it, then use VersionEntity.SetPhaseStatus to serialize and store it, avoiding direct json.Marshal/json.Unmarshal in the repo layer; update any callers to pass a PhaseStatusMap and remove any duplicated marshaling logic (also apply the same change to the other occurrence referenced around lines 264-276).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@services/control-plane/internal/applier/executor.go`:
- Around line 210-211: The comment on ApplyManifestResult.Status advertises a
"partial" status that Apply never returns; update the documentation for
ApplyManifestResult.Status (and any related comment blocks) to only list the
statuses actually produced by the Apply function (e.g., "applied" and "failed"),
or explicitly note that "partial" is produced elsewhere (by phase analysis) if
you prefer to keep it — locate the ApplyManifestResult type and the Apply
function references to make the change so the docstring accurately matches
runtime behavior.
In `@services/control-plane/internal/applier/grpc_handler.go`:
- Around line 526-559: The current logic marks the entire failed phase as FAILED
even if earlier calls in that same phase succeeded; modify the update that runs
when failedPhase == p (in the loop that updates phaseStatus using
findFailedPhase, and the analogous block around
classifyFailure/updatePhaseStatus later) to inspect the step-level results
(e.g., result.StepResults or whatever structure holds per-step statuses) for the
failedPhase: if any step in that phase succeeded, treat the phase as having
partial work — set the phase status to manifest.PhaseStatusCompleted (or to an
existing "partial" status if your manifest has one), set CompletedAt = &now and
populate Error from result.Error or execErr as currently done; only mark it
strictly FAILED when no step in that phase completed successfully. Ensure you
update both occurrences (the loop using phases/phaseStatus and the similar block
at lines ~630-649) and reference findFailedPhase, phaseStatus, phases, result,
execErr and classifyFailure when making the change.
- Around line 184-190: The code currently calls
recordHistoryWithPhaseStatus(..., expectedSeq=0) and discards the error,
allowing stale requests to append FAILED/PARTIAL rows; change this to use the
same optimistic-sequence logic as the success path: pass the correct expected
sequence (the same sequence value used when writing successful history) instead
of 0, capture the returned error from recordHistoryWithPhaseStatus, and handle
ErrSequenceConflict the same way the success path does (i.e., detect sequence
conflict and return/translate it to the same conflict response or retry logic).
Locate recordHistoryWithPhaseStatus, classifyFailure, and the
execResult.jobID/phaseStatus usage to implement the change so failed/partial
history writes respect optimistic locking and surface the error.
In `@services/control-plane/internal/manifest/history.go`:
- Around line 281-285: The code currently swallows errors from
entity.GetPhaseStatus() causing malformed phase_status to be omitted; change the
block in EntityToProto (or the function where this snippet lives) to propagate
the error instead of ignoring it: if entity.GetPhaseStatus() returns a non-nil
err, wrap it with context (e.g., "invalid phase_status for manifest <id>" or
similar) and return the error from EntityToProto (or bubble it up) so callers
see the failure; preserve the existing mv.PhaseStatus assignment when err == nil
and phaseStatus != nil.
- Around line 152-157: The code currently ignores errors from
s.repo.UpdatePhaseStatus after calling entity.SetPhaseStatus, resulting in a
false success (PARTIAL) even when phase_status persistence fails; modify the
logic in the function that calls SetPhaseStatus and s.repo.UpdatePhaseStatus so
that you capture the error returned by s.repo.UpdatePhaseStatus and propagate it
(or convert it to an appropriate failure result) instead of discarding it: call
setErr := entity.SetPhaseStatus(phaseStatus); if setErr == nil { if updErr :=
s.repo.UpdatePhaseStatus(ctx, entity.ID, entity.PhaseStatus); updErr != nil {
return updErr (or mark snapshot as failed) } } so callers observe persistence
failures from UpdatePhaseStatus rather than a silent success.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bedc0ac2-0bc5-469c-bbc7-040a889fac1a
⛔ Files ignored due to path filters (1)
- `services/control-plane/migrations/atlas.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (10)
- api/proto/meridian/control_plane/v1/apply_manifest_service.proto
- api/proto/meridian/control_plane/v1/manifest_history_service.proto
- services/control-plane/internal/applier/executor.go
- services/control-plane/internal/applier/grpc_handler.go
- services/control-plane/internal/applier/grpc_handler_test.go
- services/control-plane/internal/manifest/history.go
- services/control-plane/internal/manifest/repository.go
- services/control-plane/internal/manifest/repository_test.go
- services/control-plane/migrations/20260316000002_add_phase_status.sql
- services/control-plane/migrations/20260316000003_add_partial_apply_status.sql
```go
// Determine if this is a partial failure (some phases succeeded)
applyStatus, responseStatus := classifyFailure(execResult.phaseStatus)
response.Status = responseStatus
response.PhaseStatus = phaseStatusMapToResponseProto(execResult.phaseStatus)

// Record history with phase status
_, _ = h.recordHistoryWithPhaseStatus(ctx, req.GetManifest(), req.GetAppliedBy(), execResult.jobID, applyStatus, nil, 0, execResult.phaseStatus)
```
Don't bypass optimistic locking on failed/partial history writes.
Line 190 hardcodes expectedSeq=0 and drops the returned error, so a stale request can still append a FAILED/PARTIAL history row after another manifest version has already advanced the sequence. This should use the same sequence check and ErrSequenceConflict handling as the success path.
Suggested fix

```diff
-	_, _ = h.recordHistoryWithPhaseStatus(ctx, req.GetManifest(), req.GetAppliedBy(), execResult.jobID, applyStatus, nil, 0, execResult.phaseStatus)
+	if _, err := h.recordHistoryWithPhaseStatus(
+		ctx,
+		req.GetManifest(),
+		req.GetAppliedBy(),
+		execResult.jobID,
+		applyStatus,
+		nil,
+		req.GetExpectedSequenceNumber(),
+		execResult.phaseStatus,
+	); err != nil && errors.Is(err, manifest.ErrSequenceConflict) {
+		return nil, status.Errorf(codes.Aborted, "%v", err)
+	}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/control-plane/internal/applier/grpc_handler.go` around lines 184 -
190, The code currently calls recordHistoryWithPhaseStatus(..., expectedSeq=0)
and discards the error, allowing stale requests to append FAILED/PARTIAL rows;
change this to use the same optimistic-sequence logic as the success path: pass
the correct expected sequence (the same sequence value used when writing
successful history) instead of 0, capture the returned error from
recordHistoryWithPhaseStatus, and handle ErrSequenceConflict the same way the
success path does (i.e., detect sequence conflict and return/translate it to the
same conflict response or retry logic). Locate recordHistoryWithPhaseStatus,
classifyFailure, and the execResult.jobID/phaseStatus usage to implement the
change so failed/partial history writes respect optimistic locking and surface
the error.
```go
// On failure: mark phases based on step results.
// Steps are executed in phase order. We mark phases as COMPLETED until we
// find a failed step, then the containing phase is FAILED and the rest are SKIPPED.
failedPhase := findFailedPhase(plan, result)

for _, p := range phases {
	key := fmt.Sprintf("phase_%d", p)
	entry := phaseStatus[key]
	entry.StartedAt = &now

	if failedPhase > 0 && p < failedPhase {
		entry.Status = manifest.PhaseStatusCompleted
		entry.CompletedAt = &now
	} else if failedPhase > 0 && p == failedPhase {
		entry.Status = manifest.PhaseStatusFailed
		entry.CompletedAt = &now
		if result != nil {
			entry.Error = result.Error
		} else if execErr != nil {
			entry.Error = execErr.Error()
		}
	} else if failedPhase > 0 {
		entry.Status = manifest.PhaseStatusSkipped
		entry.StartedAt = nil
	} else {
		// No phase-level info available; mark all as failed
		entry.Status = manifest.PhaseStatusFailed
		entry.CompletedAt = &now
		if execErr != nil {
			entry.Error = execErr.Error()
		}
	}
	phaseStatus[key] = entry
}
```
Partial work inside the first failed phase is still reported as FAILED.
A phase can contain multiple calls. If some calls in phase 1 succeed and a later call in that same phase fails, updatePhaseStatus records only FAILED for phase_1, so classifyFailure returns FAILED because no phase is COMPLETED. That loses the "side effects already happened" signal this PR is adding.
Also applies to: 630-649
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/control-plane/internal/applier/grpc_handler.go` around lines 526 -
559, The current logic marks the entire failed phase as FAILED even if earlier
calls in that same phase succeeded; modify the update that runs when failedPhase
== p (in the loop that updates phaseStatus using findFailedPhase, and the
analogous block around classifyFailure/updatePhaseStatus later) to inspect the
step-level results (e.g., result.StepResults or whatever structure holds
per-step statuses) for the failedPhase: if any step in that phase succeeded,
treat the phase as having partial work — set the phase status to
manifest.PhaseStatusCompleted (or to an existing "partial" status if your
manifest has one), set CompletedAt = &now and populate Error from result.Error
or execErr as currently done; only mark it strictly FAILED when no step in that
phase completed successfully. Ensure you update both occurrences (the loop using
phases/phaseStatus and the similar block at lines ~630-649) and reference
findFailedPhase, phaseStatus, phases, result, execErr and classifyFailure when
making the change.
Claude Code Review

Commit:

Summary

Well-structured PR that adds per-phase execution tracking with proper CockroachDB migration splitting (ADD COLUMN / DROP CONSTRAINT / ADD CONSTRAINT across 3 files). The domain types, proto definitions, and test coverage are solid. Two areas worth addressing: duplicate conversion logic and a fragile positional correlation assumption in `findFailedPhase`.

Risk Assessment

Findings

Bot Review Notes

CodeRabbit thread 1 (…)
CodeRabbit thread 2 (…)
```go
		entry.StartedAt = nil
	} else {
		// No phase-level info available; mark all as failed
		entry.Status = manifest.PhaseStatusFailed
		entry.CompletedAt = &now
		if execErr != nil {
			entry.Error = execErr.Error()
		}
	}
	phaseStatus[key] = entry
}
```
MUST FIX: `findFailedPhase` builds a map keyed by `string(call.GRPCMethod)` (fully-qualified gRPC paths like `meridian.reference_data.v1.ReferenceDataService/RegisterInstrument`) but matches against `saga.StepResult.StepName`, which uses handler registry names (like `reference_data.register_instrument`).

These will never match in production, so `findFailedPhase` always returns 0, `updatePhaseStatus` marks all phases as FAILED (the `failedPhase == 0` else-branch), and `classifyFailure` never sees both COMPLETED and FAILED entries. The PARTIAL status is unreachable.

The unit tests mask this because they use matching short names for both `GRPCMethod` and `StepName`.

Suggested fix: Add a `HandlerName` field to `PlannedCall` that stores the registry name (e.g., `reference_data.register_instrument`), then use that field as the map key here instead of `GRPCMethod`. Also update tests to use realistic production names to prevent regression.
- Fix `findFailedPhase` to use positional correlation instead of name matching. Saga step results use handler names (e.g. `reference_data.register_instrument`) while execution plan calls use gRPC method paths. Positional index is the correct mapping.
- Fix `EntityToProto` to propagate `phase_status` deserialization errors instead of silently dropping them.
- Rewrite `StoreManifestVersionWithPhaseStatus` to include `phase_status` in the initial `Store` call (single atomic write) instead of a separate `UpdatePhaseStatus` call that could fail silently.
- Remove misleading "partial" from the executor `Status` field comment since partial classification happens in the gRPC handler, not the executor.
CockroachDB cannot drop and recreate a constraint with the same name in a single transaction. Split the apply_status constraint change into two migrations: one to drop the old constraint, one to add the new.
♻️ Duplicate comments (1)

services/control-plane/internal/applier/grpc_handler.go (1)

184-191: ⚠️ Potential issue | 🟠 Major — Optimistic locking is bypassed for failed/partial history writes.

Line 190 hardcodes `expectedSeq=0` and discards both return values, allowing stale requests to append `FAILED`/`PARTIAL` history rows even after another manifest version has already advanced the sequence. This should use the same sequence check as the success path (line 197).

🔧 Proposed fix

```diff
-	_, _ = h.recordHistoryWithPhaseStatus(ctx, req.GetManifest(), req.GetAppliedBy(), execResult.jobID, applyStatus, nil, 0, execResult.phaseStatus)
+	if _, err := h.recordHistoryWithPhaseStatus(
+		ctx,
+		req.GetManifest(),
+		req.GetAppliedBy(),
+		execResult.jobID,
+		applyStatus,
+		nil,
+		req.GetExpectedSequenceNumber(),
+		execResult.phaseStatus,
+	); err != nil && errors.Is(err, manifest.ErrSequenceConflict) {
+		return nil, status.Errorf(codes.Aborted, "%v", err)
+	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/control-plane/internal/applier/grpc_handler.go` around lines 184 - 191, The history write on failure currently calls recordHistoryWithPhaseStatus(..., expectedSeq=0) and discards its results, which bypasses optimistic-lock checks; change that call to use the same expected sequence value used by the success path (the sequence variable passed to the success recordHistoryWithPhaseStatus call) and handle/propagate the returned (seq, err) instead of ignoring them so that stale requests cannot append FAILED/PARTIAL rows when the manifest sequence has advanced; locate the call to recordHistoryWithPhaseStatus and replace the hardcoded 0 with the shared expectedSeq variable (and check the returned error/sequence like the success path does).
🧹 Nitpick comments (2)

services/control-plane/internal/applier/grpc_handler.go (1)

652-672: Duplicate conversion logic with `manifest.phaseStatusMapToProto`.

This function duplicates the logic in `services/control-plane/internal/manifest/history.go:336-356`. Consider exposing the manifest package's `phaseStatusMapToProto` and reusing it here, or extracting to a shared location.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/control-plane/internal/applier/grpc_handler.go` around lines 652 - 672, The function phaseStatusMapToResponseProto duplicates manifest.phaseStatusMapToProto; update this code to reuse the shared conversion by either exporting manifest.phaseStatusMapToProto (rename to PhaseStatusMapToProto) or moving the conversion into a shared helper used by both packages, then replace phaseStatusMapToResponseProto's body with a call to the shared function (preserving the target type controlplanev1.PhaseStatusDetail if necessary by adapting the shared helper return type or adding a thin adapter). Ensure references to manifest.PhaseStatusMap and the existing manifest.phaseStatusMapToProto/PhaseStatusMapToProto symbol are used so logic is not duplicated.

services/control-plane/internal/manifest/history.go (1)

135-204: Consider extracting shared logic to reduce duplication.

`StoreManifestVersionWithPhaseStatus` duplicates most of the logic from `StoreManifestVersion` (manifest marshaling, diff summary generation, graph serialization, entity creation). Consider refactoring to a shared internal helper that both methods call, passing `phaseStatus` as an optional parameter.

♻️ Suggested approach

```diff
+// storeVersionInternal contains the shared logic for storing manifest versions.
+func (s *HistoryService) storeVersionInternal(
+	ctx context.Context,
+	mf *controlplanev1.Manifest,
+	appliedBy string,
+	applyJobID *uuid.UUID,
+	applyStatus ApplyStatus,
+	graph *validator.RelationshipGraph,
+	expectedSeq int64,
+	phaseStatus PhaseStatusMap,
+) (*VersionEntity, error) {
+	// ... shared marshaling, diff, graph, entity creation logic ...
+	if phaseStatus != nil {
+		if err := entity.SetPhaseStatus(phaseStatus); err != nil {
+			return nil, fmt.Errorf("failed to serialize phase_status: %w", err)
+		}
+	}
+	// ... store and return ...
+}
+
 func (s *HistoryService) StoreManifestVersion(...) (*VersionEntity, error) {
-	// ... 60+ lines of duplicated logic ...
+	return s.storeVersionInternal(ctx, manifest, appliedBy, applyJobID, status, graph, expectedSeq, nil)
 }

 func (s *HistoryService) StoreManifestVersionWithPhaseStatus(...) (*VersionEntity, error) {
-	// ... 60+ lines of duplicated logic ...
+	return s.storeVersionInternal(ctx, mf, appliedBy, applyJobID, applyStatus, graph, expectedSeq, phaseStatus)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/control-plane/internal/manifest/history.go` around lines 135 - 204, StoreManifestVersionWithPhaseStatus duplicates the manifest marshaling, diff summary generation, relationship-graph JSON serialization, and VersionEntity construction found in StoreManifestVersion; refactor by extracting a shared helper (e.g., prepareVersionEntity or buildVersionEntity) that accepts the Manifest (mf), appliedBy, applyJobID, applyStatus, graph, expectedSeq, and an optional PhaseStatusMap, performs the protojson.Marshal, calls generateDiffSummary when needed, serializes the graph, constructs and returns a fully populated VersionEntity (or an entity plus any error), and then have both StoreManifestVersion and StoreManifestVersionWithPhaseStatus call that helper and only handle repo.Store and return; ensure SetPhaseStatus remains invoked inside the helper when a non-nil phaseStatus is provided and preserve existing error handling and semantics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 40e51b84-28c5-4b78-bf0e-e9ca2ce074e4
⛔ Files ignored due to path filters (1)
- `services/control-plane/migrations/atlas.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (5)
- services/control-plane/internal/applier/grpc_handler.go
- services/control-plane/internal/applier/grpc_handler_test.go
- services/control-plane/internal/manifest/history.go
- services/control-plane/migrations/20260316000003_add_partial_apply_status.sql
- services/control-plane/migrations/20260316000004_add_partial_apply_status_constraint.sql
Codecov Report

❌ Patch coverage is …
Keep both UpdatePhaseStatus (this branch) and GetBySequenceNumber (from develop). Regenerate proto files for merged proto changes.
```go
// phaseStatusMapToResponseProto converts a PhaseStatusMap to proto map for the response.
func phaseStatusMapToResponseProto(ps manifest.PhaseStatusMap) map[string]*controlplanev1.PhaseStatusDetail {
	if ps == nil {
```
Suggestion: `phaseStatusMapToResponseProto` here is identical to `phaseStatusMapToProto` in `history.go`. Two copies of the same conversion logic will drift. Export `PhaseStatusMapToProto` from the manifest package and call it from both places.
```go
func findFailedPhase(plan *planner.ExecutionPlan, result *ApplyManifestResult) planner.Phase {
	if result == nil || len(result.StepResults) == 0 {
		return 0
	}

	for i, step := range result.StepResults {
		if !step.Success {
			if i < len(plan.Calls) {
				return plan.Calls[i].Phase
```
Suggestion: The positional correlation between `result.StepResults` and `plan.Calls` is correct today (the executor runs calls sequentially in plan order), but this invariant is implicit. Consider adding a brief comment explaining why positional correlation holds, so future editors know this is load-bearing.
```go
// StoreManifestVersionWithPhaseStatus saves a manifest snapshot with phase-level execution status.
// Phase status is included in the initial entity creation (single atomic write).
func (s *HistoryService) StoreManifestVersionWithPhaseStatus(
	ctx context.Context,
	mf *controlplanev1.Manifest,
	appliedBy string,
	applyJobID *uuid.UUID,
	applyStatus ApplyStatus,
	graph *validator.RelationshipGraph,
	expectedSeq int64,
	phaseStatus PhaseStatusMap,
) (*VersionEntity, error) {
	if mf == nil {
		return nil, ErrNilManifest
	}
	if appliedBy == "" {
		return nil, ErrEmptyAppliedBy
	}

	marshaler := protojson.MarshalOptions{UseProtoNames: true}
	jsonBytes, err := marshaler.Marshal(mf)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal manifest to JSON: %w", err)
	}

	var diffSummary *string
	if applyStatus == ApplyStatusApplied {
		summary, diffErr := s.generateDiffSummary(ctx, mf)
		if diffErr != nil && !errors.Is(diffErr, ErrVersionNotFound) {
			return nil, fmt.Errorf("failed to generate diff summary: %w", diffErr)
		}
		if summary != "" {
			diffSummary = &summary
		}
	}

	var graphJSON *string
	if graph != nil {
		if graphBytes, graphErr := json.Marshal(graph); graphErr == nil {
			s := string(graphBytes)
			graphJSON = &s
		}
	}

	now := time.Now().UTC()
	entity := &VersionEntity{
		ID:                uuid.New(),
		Version:           mf.Version,
		ManifestJSON:      string(jsonBytes),
		AppliedAt:         now,
		AppliedBy:         appliedBy,
		ApplyStatus:       applyStatus,
		ApplyJobID:        applyJobID,
		DiffSummary:       diffSummary,
		RelationshipGraph: graphJSON,
		CreatedAt:         now,
	}

	if phaseStatus != nil {
		if err := entity.SetPhaseStatus(phaseStatus); err != nil {
			return nil, fmt.Errorf("failed to serialize phase_status: %w", err)
		}
	}

	if err := s.repo.Store(ctx, entity, expectedSeq); err != nil {
		return nil, err
	}

	return entity, nil
}
```
Suggestion: StoreManifestVersionWithPhaseStatus is a near-identical copy of StoreManifestVersion (75 lines) differing only by the SetPhaseStatus call. Consider extracting shared entity construction into a private buildVersionEntity helper.
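A minimal sketch of the suggested extraction, using simplified stand-in types — the real `VersionEntity` has many more fields, and `buildVersionEntity` is a hypothetical name, not code from this PR:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// VersionEntity is a simplified stand-in for the real manifest entity.
type VersionEntity struct {
	Version     string
	ApplyStatus string
	PhaseStatus *string // serialized JSON, nil when no phase data was supplied
	CreatedAt   time.Time
}

// buildVersionEntity is the shared constructor both store methods would call.
// The phaseStatus map is optional and only serialized when present, so the
// plain StoreManifestVersion path passes nil and gets identical behavior.
func buildVersionEntity(version, applyStatus string, phaseStatus map[string]string) (*VersionEntity, error) {
	e := &VersionEntity{
		Version:     version,
		ApplyStatus: applyStatus,
		CreatedAt:   time.Now().UTC(),
	}
	if phaseStatus != nil {
		b, err := json.Marshal(phaseStatus)
		if err != nil {
			return nil, fmt.Errorf("failed to serialize phase_status: %w", err)
		}
		s := string(b)
		e.PhaseStatus = &s
	}
	return e, nil
}

func main() {
	plain, _ := buildVersionEntity("v1", "APPLIED", nil)
	withPhases, _ := buildVersionEntity("v2", "PARTIAL", map[string]string{"phase_1": "COMPLETED"})
	fmt.Println(plain.PhaseStatus == nil, withPhases.PhaseStatus != nil) // true true
}
```

With this shape, both public store methods shrink to validation plus a call to the helper plus `repo.Store`, so future field additions land in one place.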
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
services/control-plane/internal/applier/grpc_handler.go (1)
194-216: ⚠️ Potential issue | 🟠 Major

**Persist `phase_status` on successful applies too.**

The success path still writes history through `recordHistory`, so the stored `manifest_versions` row never gets `phase_status`. `services/control-plane/internal/manifest/history.go` already added `StoreManifestVersionWithPhaseStatus`, and `EntityToProto` only emits phase data when it was persisted, so later history reads will lose the phase status you just returned in the apply response.

Suggested fix

```diff
-    snapshot, seqErr := h.recordHistory(ctx, req.GetManifest(), req.GetAppliedBy(), execResult.jobID, manifest.ApplyStatusApplied, validationResult.graph, expectedSeq)
+    snapshot, seqErr := h.recordHistoryWithPhaseStatus(
+        ctx,
+        req.GetManifest(),
+        req.GetAppliedBy(),
+        execResult.jobID,
+        manifest.ApplyStatusApplied,
+        validationResult.graph,
+        expectedSeq,
+        execResult.phaseStatus,
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/control-plane/internal/applier/grpc_handler.go` around lines 194 - 216, The manifest version row isn't persisting phase_status on successful applies; after the successful call to recordHistory in ApplyManifest handler, call the new persistence method that includes phase info (StoreManifestVersionWithPhaseStatus) on the version store instead of the current Save so the stored manifest_versions row records execResult.phaseStatus; locate the block around recordHistory and the subsequent h.versionStore.Save call in grpc_handler.go and replace or augment that save with a call to the versionStore's StoreManifestVersionWithPhaseStatus (passing req.GetManifest(), req.GetAppliedBy(), and execResult.phaseStatus) so future history reads via EntityToProto will include the persisted phase_status.
♻️ Duplicate comments (1)
services/control-plane/internal/applier/grpc_handler.go (1)
184-190: ⚠️ Potential issue | 🟠 Major

**Use the request sequence on failed/partial history writes.**

This still hardcodes `expectedSeq=0` and drops `recordHistoryWithPhaseStatus`'s error. A stale apply can therefore append FAILED/PARTIAL history after the sequence has advanced, and `ErrSequenceConflict` never reaches the client.

Suggested fix

```diff
-    _, _ = h.recordHistoryWithPhaseStatus(ctx, req.GetManifest(), req.GetAppliedBy(), execResult.jobID, applyStatus, nil, 0, execResult.phaseStatus)
+    if _, seqErr := h.recordHistoryWithPhaseStatus(
+        ctx,
+        req.GetManifest(),
+        req.GetAppliedBy(),
+        execResult.jobID,
+        applyStatus,
+        nil,
+        req.GetExpectedSequenceNumber(),
+        execResult.phaseStatus,
+    ); seqErr != nil {
+        logger.Warn("sequence number conflict during history recording", "error", seqErr)
+        return nil, status.Errorf(codes.Aborted, "%v", seqErr)
+    }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/control-plane/internal/applier/grpc_handler.go` around lines 184 - 190, The history write currently hardcodes expectedSeq=0 and discards errors from recordHistoryWithPhaseStatus; change the call that writes history to use the request sequence (e.g., req.GetSequence()) instead of 0, capture the returned error from recordHistoryWithPhaseStatus(ctx, req.GetManifest(), req.GetAppliedBy(), execResult.jobID, applyStatus, nil, req.GetSequence(), execResult.phaseStatus), and propagate sequence conflicts and other errors back to the caller (detect ErrSequenceConflict and convert/return it appropriately) rather than ignoring them so stale applies cannot append FAILED/PARTIAL history.
🧹 Nitpick comments (1)
services/control-plane/internal/manifest/history.go (1)
137-204: Extract the common snapshot-building logic.
StoreManifestVersionandStoreManifestVersionWithPhaseStatusnow duplicate manifest marshaling, diff-summary generation, graph serialization, andVersionEntityconstruction. Pulling that into one helper will make future field or validation changes land in one place instead of drifting between the two write paths.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/control-plane/internal/manifest/history.go` around lines 137 - 204, StoreManifestVersion and StoreManifestVersionWithPhaseStatus duplicate manifest marshaling, diff-summary generation, graph serialization, and VersionEntity construction; extract that logic into a single helper (e.g., buildVersionEntity or buildSnapshot) that accepts (ctx, mf *controlplanev1.Manifest, appliedBy string, applyJobID *uuid.UUID, applyStatus ApplyStatus, graph *validator.RelationshipGraph, now time.Time, phaseStatus PhaseStatusMap) and returns (*VersionEntity, error). Move protojson marshaling, generateDiffSummary (only when applyStatus == ApplyStatusApplied and preserving ErrVersionNotFound handling), graph JSON serialization, and the VersionEntity field population into that helper, and call entity.SetPhaseStatus inside the helper when phaseStatus != nil; then have both StoreManifestVersion and StoreManifestVersionWithPhaseStatus call the helper and only keep repo.Store and expectedSeq handling in the callers to preserve current semantics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 222da673-9cff-4b18-8b76-9b304b2514ca
📒 Files selected for processing (5)

- api/proto/meridian/control_plane/v1/apply_manifest_service.proto
- api/proto/meridian/control_plane/v1/manifest_history_service.proto
- services/control-plane/internal/applier/grpc_handler.go
- services/control-plane/internal/manifest/history.go
- services/control-plane/internal/manifest/repository.go
🚧 Files skipped from review as they are similar to previous changes (1)
- api/proto/meridian/control_plane/v1/apply_manifest_service.proto
```go
now := time.Now().UTC()
phases := plan.Phases()

if execErr == nil && result != nil {
	// All phases completed successfully
	for _, p := range phases {
		key := fmt.Sprintf("phase_%d", p)
		entry := phaseStatus[key]
		entry.Status = manifest.PhaseStatusCompleted
		entry.StartedAt = &now
		entry.CompletedAt = &now
		phaseStatus[key] = entry
	}
	return
}

// On failure: mark phases based on step results.
// Steps are executed in phase order. We mark phases as COMPLETED until we
// find a failed step, then the containing phase is FAILED and the rest are SKIPPED.
failedPhase := findFailedPhase(plan, result)

for _, p := range phases {
	key := fmt.Sprintf("phase_%d", p)
	entry := phaseStatus[key]
	entry.StartedAt = &now

	if failedPhase > 0 && p < failedPhase {
		entry.Status = manifest.PhaseStatusCompleted
		entry.CompletedAt = &now
	} else if failedPhase > 0 && p == failedPhase {
		entry.Status = manifest.PhaseStatusFailed
		entry.CompletedAt = &now
		if result != nil {
			entry.Error = result.Error
		} else if execErr != nil {
			entry.Error = execErr.Error()
		}
	} else if failedPhase > 0 {
		entry.Status = manifest.PhaseStatusSkipped
		entry.StartedAt = nil
	} else {
		// No phase-level info available; mark all as failed
		entry.Status = manifest.PhaseStatusFailed
		entry.CompletedAt = &now
		if execErr != nil {
			entry.Error = execErr.Error()
		}
	}
	phaseStatus[key] = entry
}
```
**Don't backfill phase timings with one post-run timestamp.**

`updatePhaseStatus` runs only after `executor.Apply` returns, so every executed phase gets the same synthetic `now`. That does not match the `started_at` / `completed_at` contract added in `api/proto/meridian/control_plane/v1/manifest_history_service.proto` and will make the stored phase timing misleading. Either thread real per-phase timing out of the executor or leave these fields unset until that data exists.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/control-plane/internal/applier/grpc_handler.go` around lines 512 -
560, The code in updatePhaseStatus (after executor.Apply) currently backfills
StartedAt/CompletedAt with a single synthetic now for every phase (variables:
phases, phaseStatus, findFailedPhase, result, execErr), which violates the
manifest timing contract; instead, stop setting entry.StartedAt and
entry.CompletedAt to &now unless you can source real per-phase timings from the
executor result (e.g., per-step/phase timestamps on result or StepResults) and
copy those into entry.StartedAt/CompletedAt; otherwise, only set entry.Status
and entry.Error as appropriate and leave StartedAt/CompletedAt nil so inaccurate
timestamps are not recorded.
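If real per-phase timing were threaded out of the executor, a minimal recorder could look like the sketch below — `PhaseTimer` and its hooks are hypothetical names for illustration, not part of this PR:

```go
package main

import (
	"fmt"
	"time"
)

// PhaseTiming captures real per-phase timestamps as the executor runs,
// instead of backfilling one synthetic time after Apply returns.
type PhaseTiming struct {
	StartedAt   *time.Time
	CompletedAt *time.Time
}

// PhaseTimer is a hypothetical recorder the executor would invoke around each phase.
type PhaseTimer struct {
	timings map[string]*PhaseTiming
}

func NewPhaseTimer() *PhaseTimer {
	return &PhaseTimer{timings: make(map[string]*PhaseTiming)}
}

// PhaseStarted records the moment a phase actually begins executing.
func (t *PhaseTimer) PhaseStarted(key string) {
	now := time.Now().UTC()
	t.timings[key] = &PhaseTiming{StartedAt: &now}
}

// PhaseCompleted records the moment a phase finishes (success or failure).
func (t *PhaseTimer) PhaseCompleted(key string) {
	if entry, ok := t.timings[key]; ok {
		now := time.Now().UTC()
		entry.CompletedAt = &now
	}
}

func main() {
	timer := NewPhaseTimer()
	timer.PhaseStarted("phase_1")
	timer.PhaseCompleted("phase_1")
	timer.PhaseStarted("phase_2") // phase 2 fails mid-run: no CompletedAt recorded

	t1 := timer.timings["phase_1"]
	t2 := timer.timings["phase_2"]
	fmt.Println(t1.CompletedAt != nil, t2.CompletedAt == nil) // true true
}
```

The handler would then copy these timings into `PhaseStatusEntry` after the run; a phase with a `StartedAt` but no `CompletedAt` is exactly the failed or interrupted phase.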
Design suggestions acknowledged - timestamps represent recording time, optimistic locking on failure path is pre-existing behavior, and code dedup is follow-up work
Summary
`PARTIAL` apply status distinguishes partial failures (some phases succeeded) from complete failures
Migrations (CockroachDB-safe)
- `20260316000002_add_phase_status.sql`: Add `phase_status JSONB` column to `manifest_versions`
- `20260316000003_add_partial_apply_status.sql`: Alter CHECK constraint to include `PARTIAL` (separate migration for CockroachDB compatibility)

Proto
- `PhaseStatusDetail` message with status, started_at, completed_at, error fields
- `phase_status` map field on `ManifestVersion` and `ApplyManifestResponse`
- `APPLY_STATUS_PARTIAL` and `APPLY_MANIFEST_STATUS_PARTIAL` enum values

Domain (manifest package)
- `PhaseStatusMap`, `PhaseStatusEntry`, `PhaseExecutionStatus` types
- `GetPhaseStatus()` / `SetPhaseStatus()` on `VersionEntity` for JSON round-tripping
- `UpdatePhaseStatus()` repository method
- `StoreManifestVersionWithPhaseStatus()` on `HistoryService`
- `ApplyStatusPartial` constant

gRPC Handler (applier package)
- `buildInitialPhaseStatus()` derives pending phase entries from the execution plan
- `updatePhaseStatus()` marks phases as COMPLETED/FAILED/SKIPPED based on saga step results
- `classifyFailure()` distinguishes partial vs complete failure

Task Master
045-manifest-source-of-truth.11 - Add phase_status and PARTIAL apply status
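The classification rule described above — PARTIAL when at least one phase completed before the failure, FAILED otherwise — can be sketched in isolation. This is a simplified stand-in, not the real `classifyFailure` signature in `grpc_handler.go`:

```go
package main

import "fmt"

// StepResult is a simplified stand-in for a saga step outcome.
type StepResult struct {
	Phase   int
	Success bool
}

// classifyFailure mirrors the PR's described rule for a failed apply:
// if any step succeeded before the failure, the run is PARTIAL;
// if nothing succeeded, it is a complete FAILED run.
func classifyFailure(steps []StepResult) string {
	for _, s := range steps {
		if s.Success {
			return "PARTIAL"
		}
	}
	return "FAILED"
}

func main() {
	fmt.Println(classifyFailure([]StepResult{{1, true}, {2, false}})) // PARTIAL
	fmt.Println(classifyFailure([]StepResult{{1, false}}))            // FAILED
}
```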
Test plan
- `classifyFailure` (nil, all-failed, partial)
- `buildInitialPhaseStatus` from execution plan
- `updatePhaseStatus` (all success, partial failure)
- `findFailedPhase` (no result, with result)
- `phaseStatusMapToResponseProto` (nil, populated)
- `VersionEntity.GetPhaseStatus`/`SetPhaseStatus` round-trip
- `PARTIAL` apply status persists to CockroachDB
- `UpdatePhaseStatus` JSONB column round-trip